Data set books contains the daily sales of paperback and hardcover books at the same store. The task is to forecast the next four days’ sales for paperback and hardcover books.
require(fpp2)
data(books)
autoplot(books) +
scale_colour_manual(values=c('darkorchid', 'darkgreen'))
hardcover<- ts(books[,2], start = 1, frequency = 7)
softbound <- ts(books[,1], start = 1, frequency = 7)
decompose(hardcover)%>%
autoplot()+
ggtitle('Decomposition of Hardbound Books')
decompose(softbound)%>%
autoplot()+
ggtitle('Decomposition of Softbound Books')
hb <- ses(hardcover, h=5)
sb <- ses(softbound, h=5)
autoplot(hb, colour ='darkgreen') +
autolayer(fitted(hb), series ='Smoothed Values') +
labs(title ='Hardbound Book Sales Forecast',
x = "Year",
y = "Daily Sales of Hardbound Books")
autoplot(sb, colour ='darkorchid') +
autolayer(fitted(sb), series ='Smoothed Values') +
labs(title ='Softbound Book Sales Forecast',
x = "Year",
y = "Daily Sales of Softbound Books")
acc_soft <- accuracy(sb)
acc_hard <-accuracy(hb)
Hardbound Root Mean Squared Error: 31.931015
Softbound Root Mean Squared Error: 33.6376868
holt_soft <- holt(softbound, seasonal = 'additive', h=4)
holt_hard <- holt(hardcover, seasonal = 'additive', h=4)
autoplot(softbound) +
autolayer(holt_soft, series = 'Holt Additive', PI=FALSE) +
labs(title ='Holt 4-Day Linear Forecast of Softbound Sales',
x = "Year",
y = "Daily Sales of Softbound Books")
autoplot(hardcover) +
autolayer(holt_soft, series = 'Holt Additive', PI=FALSE) +
labs(title ='Holt4-Day Linear Forecast of Hardbound Sales',
x = "Year",
y = "Daily Sales of Hardbound Books")
acc_holt_hard <-accuracy(holt_hard)
acc_holt_soft <-accuracy(holt_soft)
RMSE Exponential Smoothed Hardbound Sales: 31.931
RMSE Exponential Smoothed Softbound Sales: 33.638
RMSE Holt-Winters Hardbound Sales: 27.194
RMSE Holt-Winters Softbound Sales: 31.137
The Holt-Winter method does indeed reduce the root mean squared errors for both the softbound and hardbound book sales. This would suggest that there is in fact come level of trend acting on the book sales series.
fc_hh<-forecast(holt_hard)
fc_sh<-forecast(holt_soft)
fc_hes<-forecast(hb, h=4)
fc_ses<-forecast(sb, h = 4)
autoplot(softbound) +
autolayer(fc_ses, series = 'Exponential Smoothed Softbound', PI=FALSE) +
autolayer(fc_sh, series = 'Holt Linear Softbound', PI=FALSE) +
labs(title ='Comparing Holt-Linear and Exponentially Smoothed Softbound Sales',
x = "Year",
y = "Daily Sales of Softbound Books")
autoplot(hardcover) +
autolayer(fc_hes, series = 'Exponential Smoothed Softbound', PI=FALSE) +
autolayer(fc_hh, series = 'Holt Linear Softbound', PI=FALSE) +
labs(title ='Comparing Holt-Linear and Exponentially Smoothed Hardbound Sales',
x = "Year",
y = "Daily Sales of Hardbound Books")
pi_ses_s<-1.96*acc_soft[2]
pi_ses_h<-1.96*acc_hard[2]
pi_hl_s<-1.96*acc_holt_soft[2]
pi_hl_s<-1.96*acc_holt_hard[2]
l_interval_ses_s <- round(fc_ses$mean[1]-pi_ses_s,3)
l_interval_ses_h <- round(fc_hes$mean[1]-pi_ses_h,3)
l_interval_hl_s <- round(fc_sh$mean[1]-pi_hl_s,3)
l_interval_hl_h <- round(fc_hh$mean[1]-pi_hl_s,3)
u_interval_ses_s <- round(fc_ses$mean[1]+pi_ses_s,3)
u_interval_ses_h <- round(fc_hes$mean[1]+pi_ses_h,3)
u_interval_hl_s <- round(fc_sh$mean[1]+pi_hl_s,3)
u_interval_hl_h <- round(fc_hh$mean[1]+pi_hl_s,3)
| Model Lower | Model Upper | Calculated Lower | Calculated Upper | |
|---|---|---|---|---|
| SES Softbound | 138.867 | 275.352 | 141.18 | 273.04 |
| SES Hardbound | 174.78 | 304.34 | 176.975 | 302.145 |
| Holt Softbound | 143.913 | 275.021 | 156.167 | 262.766 |
| Holt Hardbound | 192.922 | 307.426 | 196.874 | 303.473 |
The intervals are not exactly the same, with the manually calculated prediction intervals. Manual intervals are a bit less consistent with the library computed interval values using the holt-linear method than they are with exponentially smoothed model version.
For this exercise use data set ukcars, the quarterly UK passenger vehicle production data from 1977Q1–2005Q1.
data("ukcars")
autoplot(ukcars, colour = 'slateblue') +
labs(title ='UKCars Data Q1 1977 - Q2 2005',
x = 'Quarter - Year',
y = 'Car Production')
The time series shows that the ‘ukcars’ data has a seasonal component, a trend, as well as some less structured periodicity and random-like variation.
ukcars%>%
stl(t.window = 13, s.window = 'periodic', robust = TRUE)%>%
autoplot()+
ggtitle("Decomposition of UK Cars Data usign 'STL' ")
fit <- stlf(ukcars, etsmodel='AAN', damped=TRUE)
stlf(ukcars, etsmodel='AAN', damped=TRUE)%>%
autoplot()+
labs(title ='UKCars Forecast using STL Damped',
x = 'Quarter - Year',
y = 'Car Production')
rmse_holt_d <- accuracy(fit)[2]
fit <- stlf(ukcars, etsmodel='AAN', damped=FALSE)
stlf(ukcars, etsmodel='AAN', damped=FALSE)%>%
autoplot()+
labs(title ='UKCars Forecast using STL Un-Damped',
x = 'Quarter - Year',
y = 'Car Production')
rmse_holt_u <- accuracy(fit)[2]
ets(ukcars)%>%
autoplot()+
labs(title ='UKCars Forecast using ETS',
x = 'Quarter - Year',
y = 'Car Production')
fit <- ets(ukcars)
rmse_ets <- accuracy(fit)[2]
STL: 23.295002
HOLT: 23.3211329
ETS: 25.2324409
The ETS method marginally provides the best RMSE of all the methods we have explored in this exercise, but the differences were absolutely minor.
fit1 <- ets(ukcars)
fit1 %>% forecast(h=4) %>%
autoplot() +
ggtitle("ETS FOrcast of UKCars Data")
forecast(fit1)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 Q2 427.4885 394.2576 460.7195 376.6662 478.3109
## 2005 Q3 361.3329 322.2353 400.4305 301.5383 421.1275
## 2005 Q4 404.5358 360.3437 448.7280 336.9497 472.1219
## 2006 Q1 431.8154 383.0568 480.5741 357.2455 506.3854
## 2006 Q2 427.4885 374.5571 480.4200 346.5369 508.4401
## 2006 Q3 361.3329 304.5345 418.1313 274.4672 448.1986
## 2006 Q4 404.5358 344.1174 464.9542 312.1338 496.9378
## 2007 Q1 431.8154 367.9809 495.6500 334.1890 529.4419
fit2 <- stlf(ukcars, etsmodel='AAN', damped=FALSE)
fit2 %>% forecast(h=4) %>%
autoplot() +
ggtitle("STL Un-Damped Holt Linear (AAN) Forcast of UKCars Data")
forecast(fit2)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 Q2 416.1915 385.7950 446.5881 369.7040 462.6791
## 2005 Q3 366.3343 328.7907 403.8780 308.9163 423.7524
## 2005 Q4 403.8032 360.2690 447.3375 337.2233 470.3831
## 2006 Q1 436.4809 387.6847 485.2771 361.8535 511.1083
## 2006 Q2 419.8568 366.3120 473.4016 337.9671 501.7465
## 2006 Q3 369.9996 312.0932 427.9060 281.4394 458.5598
## 2006 Q4 407.4685 345.5056 469.4313 312.7045 502.2325
## 2007 Q1 440.1461 374.3755 505.9167 339.5587 540.7336
fit3 <- stlf(ukcars, etsmodel='AAN', damped=TRUE)
fit3%>%forecast(h=4) %>%
autoplot() +
ggtitle("Holts Damped Forcast of UKCars Data")
forecast(fit3)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 Q2 415.0289 384.4576 445.6001 368.2742 461.7836
## 2005 Q3 364.2543 326.8700 401.6386 307.0799 421.4287
## 2005 Q4 400.8059 357.6702 443.9416 334.8355 466.7762
## 2006 Q1 432.5663 384.3596 480.7731 358.8404 506.2922
## 2006 Q2 415.0250 362.2312 467.8189 334.2838 495.7663
## 2006 Q3 364.2507 307.2369 421.2646 277.0556 451.4459
## 2006 Q4 400.8026 339.8596 461.7456 307.5983 494.0069
## 2007 Q1 432.5633 367.9289 497.1976 333.7136 531.4130
autoplot(ukcars)+
autolayer(forecast(fit1), series ="ETS", PI=FALSE)+
autolayer(forecast(fit2), series ="Holts Linear Un-Damped", PI=FALSE)+
autolayer(forecast(fit3), series ="Holts Linear Damped", PI=FALSE )+
ggtitle("Forecast Overlays ")+
xlab('Years')+
ylab("Car Production Levels")
Looking at the RMSE of each method, plus these charts of the predictions and the data itself the question becomes how do you choose? All of the models have approximately the same RMSE, they all have almost Idendical plots projecting forward….the question is at what value do they start and move on…it seems like the best way to choose would be to select a forcast with the tightest prediction interval around the first few predictions.
Both of the Holt Linear Damped and the Holt Linear Un-Damped are about the same, and slightly better than ETS model, so I would feel comfortable using either of them in making next quarter predictions short of some crazy chang ein the global economy or automotive markets. I think given that they are close to the same, I would choose the model with the least amount of mathematical bamboozelry and go with Un-Damped as it is a simpler model.
checkresiduals(fit2)
##
## Ljung-Box test
##
## data: Residuals from STL + ETS(A,A,N)
## Q* = 22.061, df = 4, p-value = 0.0001949
##
## Model df: 4. Total lags used: 8
In looking at the residuals, they are relatively normally distributed, and the line chart shows no specific pattern to them. Both of these things are good. However in looking at the lag/acf there is clearly some pattern unaccounted for in the residuals, seen in the wave-like form there. Also the Ljung-Box tests tells us that we are not looking at an independently distributed set of residuals. However, this should be expected given the nature of our data and the fact that we did not account for the seasonality in it prior to modeling.